50 research outputs found

Convolutional Embedding for Edit Distance

Author: Cheng James
Dai Xinyan
Wang Yuxuan
Yan Xiao
Yang Han
Zhou Kaiwen
Publication venue: Association for Computing Machinery (ACM)
Publication date: 22/05/2020

Edit-distance-based string similarity search has many applications such as spell correction, data de-duplication, and sequence alignment. However, computing edit distance is known to have high complexity, which makes string similarity search challenging for large datasets. In this paper, we propose a deep learning pipeline (called CNN-ED) that embeds edit distance into Euclidean distance for fast approximate similarity search. A convolutional neural network (CNN) is used to generate fixed-length vector embeddings for a dataset of strings and the loss function is a combination of the triplet loss and the approximation error. To justify our choice of using CNN instead of other structures (e.g., RNN) as the model, theoretical analysis is conducted to show that some basic operations in our CNN model preserve edit distance. Experimental results show that CNN-ED outperforms data-independent CGK embedding and RNN-based GRU embedding in terms of both accuracy and efficiency by a large margin. We also show that string similarity search can be significantly accelerated using CNN-based embeddings, sometimes by orders of magnitude.

Comment: Accepted by the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 202
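As a rough illustration of the loss described in the abstract above, the following sketch pairs the classic dynamic-programming edit distance with a loss that adds a triplet term and an approximation-error term. The abstract does not give the exact form of the loss, so the `margin` and `alpha` parameters, the absolute-error form, and the function names are assumptions, and the embeddings are supplied by hand rather than produced by a CNN.

```python
import math

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def combined_loss(anchor, pos, neg, emb, margin=1.0, alpha=0.1):
    """Triplet loss plus weighted approximation error (assumed form).

    emb maps each string to its embedding vector; margin and alpha are
    hypothetical hyperparameters, not values taken from the paper.
    """
    d_ap = euclidean(emb[anchor], emb[pos])
    d_an = euclidean(emb[anchor], emb[neg])
    # Triplet term: the positive should be closer than the negative.
    triplet = max(0.0, d_ap - d_an + margin)
    # Approximation term: embedding distance should track edit distance.
    approx = (abs(d_ap - edit_distance(anchor, pos))
              + abs(d_an - edit_distance(anchor, neg)))
    return triplet + alpha * approx
```

When the embedding distances match the edit distances exactly and the negative is farther than the positive by at least the margin, both terms vanish and the loss is zero.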

arXiv.org e-Print Archive

Crossref

Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search

Author: Cheng James
Dai Xinyan
Liu Jie
Ng Kelvin K. W.
Yan Xiao
Publication date: 20/11/2019

Vector quantization (VQ) techniques are widely used in similarity search for data compression, fast metric computation, etc. Originally designed for Euclidean distance, existing VQ techniques (e.g., PQ, AQ) explicitly or implicitly minimize the quantization error. In this paper, we present a new angle to analyze the quantization error, which decomposes the quantization error into norm error and direction error. We show that quantization errors in norm have much higher influence on inner products than quantization errors in direction, and small quantization error does not necessarily lead to good performance in maximum inner product search (MIPS). Based on this observation, we propose norm-explicit quantization (NEQ), a general paradigm that improves existing VQ techniques for MIPS. NEQ quantizes the norms of items in a dataset explicitly to reduce errors in norm, which is crucial for MIPS. For the direction vectors, NEQ can simply reuse an existing VQ technique to quantize them without modification. We conducted extensive experiments on a variety of datasets and parameter configurations. The experimental results show that NEQ improves the performance of various VQ techniques for MIPS, including PQ, OPQ, RQ and AQ.
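The norm and direction split described in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scalar `norm_codebook`, the single-codebook lookup standing in for an existing VQ technique, and all function names are assumptions, and items are assumed to be non-zero vectors.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(v, v))

def neq_encode(x, norm_codebook, direction_codebook):
    """Quantize the norm and the direction of x separately.

    norm_codebook is a list of scalars; direction_codebook is a list of
    unit vectors standing in for the codewords of an existing VQ method.
    """
    n = norm(x)
    direction = [xi / n for xi in x]  # assumes x is not the zero vector
    # Explicit norm quantization: nearest scalar codeword.
    norm_code = min(norm_codebook, key=lambda c: abs(c - n))
    # Direction quantization: codeword best aligned with the direction.
    dir_code = max(direction_codebook, key=lambda c: dot(c, direction))
    return norm_code, dir_code

def approx_inner_product(query, code):
    """Approximate <query, x> as quantized norm times <query, quantized direction>."""
    norm_code, dir_code = code
    return norm_code * dot(query, dir_code)
```

Because the approximated inner product is the quantized norm multiplied by the inner product with the quantized direction, a relative error in the norm scales the result by the same factor for every query.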

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Understanding and Improving Proximity Graph based Maximum Inner Product Search

Author: Cheng James
Dai Xinyan
Li Zhirong
Liu Jie
Yan Xiao
Yang Ming-Chang
Publication date: 09/12/2019

The inner-product navigable small world graph (ip-NSW) represents the state-of-the-art method for approximate maximum inner product search (MIPS) and it can achieve an order of magnitude speedup over the fastest baseline. However, to date it is still unclear where its exceptional performance comes from. In this paper, we show that there is a strong norm bias in the MIPS problem, which means that the large norm items are very likely to become the result of MIPS. Then we explain the good performance of ip-NSW as matching the norm bias of the MIPS problem: large norm items have big in-degrees in the ip-NSW proximity graph and a walk on the graph spends the majority of computation on these items, thus effectively avoiding unnecessary computation on small norm items. Furthermore, we propose the ip-NSW+ algorithm, which improves ip-NSW by introducing an additional angular proximity graph. Search is first conducted on the angular graph to find the angular neighbors of a query and then the MIPS neighbors of these angular neighbors are used to initialize the candidate pool for search on the inner-product proximity graph. Experimental results show that ip-NSW+ consistently and significantly outperforms ip-NSW and provides more robust performance under different data distributions.

Comment: 8 pages, 8 figures
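The two-stage search described in the abstract above might look roughly like the sketch below. It assumes both proximity graphs are already built and given as adjacency dictionaries, and it uses a simplified best-first search as a stand-in for the actual graph walk; `graph_search`, `two_stage_search`, and the `entry` parameter are hypothetical names, not the authors' API.

```python
import heapq
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def cosine(u, v):
    """Cosine similarity; assumes neither vector is zero."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def graph_search(query, items, graph, score, seeds, k):
    """Simplified best-first search over a proximity graph.

    Expands the highest-scoring unexpanded node until the best remaining
    candidate scores below the current k-th best result.
    """
    visited = set(seeds)
    scored = [(score(query, items[s]), s) for s in seeds]
    candidates = [(-sc, s) for sc, s in scored]  # max-heap via negation
    heapq.heapify(candidates)
    results = sorted(scored, reverse=True)[:k]
    while candidates:
        neg_sc, node = heapq.heappop(candidates)
        if len(results) >= k and -neg_sc < results[-1][0]:
            break  # no remaining candidate can improve the top-k
        for nb in graph[node]:
            if nb not in visited:
                visited.add(nb)
                sc = score(query, items[nb])
                heapq.heappush(candidates, (-sc, nb))
                results.append((sc, nb))
        results = sorted(results, reverse=True)[:k]
    return [node for _, node in results]

def two_stage_search(query, items, angular_graph, ip_graph, entry, k):
    """Stage 1: angular neighbors; stage 2: inner-product search seeded by
    the inner-product-graph neighbors of those angular neighbors."""
    angular = graph_search(query, items, angular_graph, cosine, [entry], k)
    seeds = []
    for node in angular:
        for nb in ip_graph[node]:
            if nb not in seeds:
                seeds.append(nb)
    return graph_search(query, items, ip_graph, dot, seeds, k)
```

The design point this illustrates is the seeding step: the angular stage only picks good starting points, while the final ranking is still done by inner product on the second graph.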

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models

Author: Chen Jitong
Dai Junyu
Guo Dong
Han Bing
Hao Weituo
He Xinyan
Qian Yanmin
Song Xuchen
Wang Yuxuan
Publication date: 06/09/2023

Music editing primarily entails the modification of instrument tracks or remixing as a whole, which offers a novel reinterpretation of the original piece through a series of operations. These music processing methods hold immense potential across various applications but demand substantial expertise. Prior methodologies, although effective for image and audio modifications, falter when directly applied to music. This is attributed to music's distinctive data nature, where such methods can inadvertently compromise the intrinsic harmony and coherence of music. In this paper, we develop InstructME, an Instruction guided Music Editing and remixing framework based on latent diffusion models. Our framework fortifies the U-Net with multi-scale aggregation in order to maintain consistency before and after editing. In addition, we introduce a chord progression matrix as condition information and incorporate it in the semantic space to improve melodic harmony while editing. For accommodating extended musical pieces, InstructME employs a chunk transformer, enabling it to discern long-term temporal dependencies within music sequences. We tested InstructME in instrument-editing, remixing, and multi-round editing. Both subjective and objective evaluations indicate that our proposed method significantly surpasses preceding systems in music quality, text relevance and harmony. Demo samples are available at https://musicedit.github.io/

arXiv.org e-Print Archive

Geometric Symmetry of Dielectric Antenna Influencing Light Absorption in Quantum-Sized Metal Nanocrystals: A Comparative Study

Author: Gretchen Hall
Kowsalya Devi Rasamani
Rafaela Makrypodi
Xinyan Dai
Yugang Sun
Publication venue: Frontiers Media SA
Publication date: 01/01/2018

Silica nanoparticles, optically transparent in the visible spectral region, represent a class of dielectric antennas that tune the propagation and local field distribution of visible light through surface scattering while the energy loss is minimized. The light scattering on the surface of silica nanoparticles includes resonant scattering and random scattering, which strongly depend on their geometry: spherical silica nanoparticles with the highest geometrical symmetry favor light scattering resonances on the nanoparticle surfaces to promote resonant scattering, while non-spherical silica nanoparticles mainly support random scattering. Both resonant scattering and random scattering of light on the silica nanoparticles are capable of enhancing the light absorption in quantum-sized metal nanocrystals attached to the surfaces of the silica nanoparticles. The contributions of resonant scattering and random scattering to the enhancement of light absorption have been compared and discussed. The understanding highlights the importance of the geometry of the silica nanoparticle antenna in the design and synthesis of composite materials for efficient light harvesting.

Directory of Open Access Journals

Frontiers - Publisher Connector